N-gram Language Models and POS Distribution for the Identification of Spanish Varieties (Ngrammes et Traits Morphosyntaxiques pour la Identification de Variétés de l'Espagnol) [in French]
Abstract
This article presents supervised computational methods for identifying Spanish varieties. The features used are classical character and word n-gram language models together with POS and morphological information. To our knowledge, using POS and morphological features for this task is new, and we aim to explore how far language varieties can be identified from grammatical differences alone. The experiments use four journalistic corpora, from Spain, Argentina, Mexico and Peru.
KEYWORDS: automatic classification, n-grams, Spanish, national varieties.
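As a rough illustration of the character n-gram side of this approach, the sketch below trains one character trigram model per variety and assigns a text to the variety whose model gives it the highest log-probability. It is not the authors' implementation: the add-one smoothing, the trigram order, the names `CharNgramLM`, `train_models` and `identify_variety`, and the toy sentences in `TOY_TRAIN` are all assumptions made for illustration, and the sentences are invented rather than drawn from the corpora.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased, padded text."""
    padded = " " * (n - 1) + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramLM:
    """Character n-gram model with add-one smoothing (an assumed choice)."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = Counter()    # counts of full n-grams
        self.context_counts = Counter()  # counts of the (n-1)-character contexts
        self.alphabet = set()            # characters seen in predicted position

    def train(self, texts):
        for text in texts:
            for gram in char_ngrams(text, self.n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1
                self.alphabet.add(gram[-1])

    def log_prob(self, text, alphabet_size):
        """Sum of log P(char | previous n-1 chars) under add-one smoothing."""
        score = 0.0
        for gram in char_ngrams(text, self.n):
            numerator = self.ngram_counts[gram] + 1
            denominator = self.context_counts[gram[:-1]] + alphabet_size
            score += math.log(numerator / denominator)
        return score


def train_models(labelled_texts, n=3):
    """Train one model per variety label."""
    models = {}
    for label, texts in labelled_texts.items():
        model = CharNgramLM(n)
        model.train(texts)
        models[label] = model
    return models


def identify_variety(text, models):
    """Return the label whose model scores the text highest."""
    # Shared alphabet size across models, plus one slot for unseen characters.
    shared = set().union(*(m.alphabet for m in models.values()))
    alphabet_size = len(shared) + 1
    return max(models, key=lambda label: models[label].log_prob(text, alphabet_size))


# Invented toy sentences, for illustration only (not from the paper's corpora).
TOY_TRAIN = {
    "ES": ["vosotros tenéis el ordenador y el móvil",
           "coged el coche y aparcad aquí"],
    "MX": ["ustedes tienen la computadora y el celular",
           "tomen el carro y estaciónense aquí"],
}

models = train_models(TOY_TRAIN)
print(identify_variety("vosotros tenéis un ordenador", models))
```

A POS-based variant of the same idea would replace the character sequence with a sequence of tags produced by a tagger, so that only grammatical patterns, not vocabulary, drive the decision.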
Similar resources
Isolation, Identification and Screening of Saharan Actinomycete Strain Streptomyces fimbriatus AC31 Endowed with Antimicrobial Activity
The increasing global public health concern of antimicrobial resistance (AMR) necessitates exploration of natural antimicrobial agents as potential alternatives. This study aims to investigate antimicrobial activities of Saharan actinomycetes, with specific focus on the strain Streptomyces fimbriatus AC31, that holds promising potential as an alternative to combat AMR. In this context, 32 actin...
Genetic erosion of traditional varieties of vegetable crops in Europe: tomato cultivation in Valencia (Spain) as a case study
Ever since the arrival of the tomato to Spain in the 16th century, great diversification of the crop has taken place, giving rise to a rich collection of varietal types. The ‘Comunidad Valenciana’, with its deep-rooted agricultural tradition, is one of the Spanish regions with the greatest diversity in traditional tomato varieties, characterised by their local adaptation and high fruit quality....
Complete file of the Journal of French Language Studies, biannual scientific research journal of French language, Faculty of Foreign Languages, University of Isfahan
In the name of God. Revue des Études de la Langue Française, biannual journal of the Faculty of Foreign Languages of the University of Isfahan. Fifth year, No. 8, Spring-Summer 2013, ISSN 2008-6571, electronic ISSN 2322-469X. This journal is indexed in: Ulrichsweb: global serials directory http://ulrichsweb.serialssolutions.com; DOAJ: Directory of Open Access Journals http://www.doaj.org ...
Extraction and representation of support verb constructions in Spanish (Extraction et représentation des constructions à verbe support en espagnol) [in French]
Abstract. The computational processing of support verb constructions (prendre une photo "take a photo", faire une présentation "give a presentation") is a difficult task in NLP. This also holds for Spanish, where these constructions are frequent in texts but are often missing from machine-readable lexicons. Our goal is to extract support verb constructions from a very large...
External Lexical Information for Multilingual Part-of-Speech Tagging
Morphosyntactic lexicons and word vector representations have both proven useful for improving the accuracy of statistical part-of-speech taggers. Here we compare the performances of four systems on datasets covering 16 languages, two of these systems being feature-based (MEMMs and CRFs) and two of them being neural-based (bi-LSTMs). We show that, on average, all four approaches perform similar...